Supporting Information for: Phylogenetic generalised linear mixed-effects modelling with glmmTMB R package

Author

Coralie Williams, Maeve McGillycuddy, Szymek M. Drobniak, Ben M. Bolker, David I. Warton, Shinichi Nakagawa

Published

August 14, 2025

1 Overview

Code
knitr::opts_chunk$set(
  warning = FALSE,
  message = FALSE
)

pacman::p_load(
  rmarkdown, readr, dplyr, purrr, splitstackshape, tidyverse, tidyr, ggplot2,
  vroom, details, sessioninfo, devtools, readxl, forcats, stringr, ape, phytools, 
  geiger, data.table, pavo, MASS, glmmTMB, brms, MCMCglmm, phyr, INLA, broom.mixed,
  knitr, bayestestR, performance, rbenchmark, here, DHARMa, emmeans
)
#setwd("C:/Users/z5394590/OneDrive - UNSW/Documents/Projects/phylo_glmm_sim/docs")

options(digits = 4, scipen = 5)

This webpage provides supporting information for the manuscript “Phylogenetic generalised linear mixed-effects modelling with glmmTMB R package”. The webpage is organised into three sections: (1) First, we describe the simulated models and data-generating mechanisms for each of the five evaluated R packages for fitting PGLMMs (2) We then use the simulated data to illustrate compilation and sampling runtimes for the Bayesian MCMC models. (3) Finally, we presents two case studies illustrating the application of these methods to ecological and evolutionary data, the first on bird color traits and the second on plant traits.

For comments or questions, please contact the corresponding author at coralie.williams@unsw.edu.au or coraliewilliams@outlook.com.

2 Models

We demonstrate and explain how to fit phylogenetic generalized mixed models (PGLMM) using five different packages which we used in our simulation study: glmmTMB, phyr, MCMCglmm, brms, and INLA. The models are fitted to one simulated dataset using a randomly generated phylogenetic tree and a set of model parameters. We compare the performance of these packages in terms of runtime, accuracy, and bias of the fixed effect mean and random effect variance estimates.

We assume a set of trait measures \(y_{ij}\) repeated measure (observation) \(i\) and for each species \(j\). The model is specified as follows:

\[ y_{ij} = \beta_0 + \beta_1 x_{ij} + n_j + p_j + \varepsilon_{ij} \]

Where:

  • \(y_{ij}\): trait response for measure (observation) \(i\) in species \(j\)
  • \(\beta_0, \beta_1\): fixed effects (intercept and slope)
  • \(n_i \sim \mathcal{N}(0, \sigma^2_{\text{n}})\): species-level effect (non-phylogenetic)
  • \(p_i \sim \mathcal{N}(0, \sigma^2_{\text{p}} \mathbf{A})\): species-level effect (phylogenetic), where \(\mathbf{A}\) is the phylogenetic correlation matrix
  • \(\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2_{e})\): residual error

Simulate data

First we specify the parameter values for one simulation run. We assume the following:

Code
### set up model parameters 
seed <- 1         
b0 <- 1           # fixed effect intercept
b1 <- 1.5         # fixed effect slope
k.species <- 30   # number of species
n.reps <- 10      # number of repeated measures per species (assuming a balanced design)
sigma2.n <- 0.25  # variance of non-phylogenetic effect
sigma2.p <- 0.25  # variance of phylogenetic effect
sigma2.e <- 0.2   # residual error variance

Simulate data based on these parameter values:

Code
set.seed(seed)

# set up k.obs (number of observations per species)
k.obs <- n.reps # assumes a balanced design
# species id 
sp.id <- rep(seq_len(k.species), times=k.obs)
# total number of observations
n <- length(sp.id)

# simulate simple dataframe with covariate (x) variable
x <- runif(n, 10, 20) 
dat <- data.frame(obs = 1:n, x = x, species = sp.id)

# simulate tree and obtain phylo matrix
tree <- rtree(k.species, tip.label = seq_len(k.species))
tree <- compute.brlen(tree, power=1) # power of 1 means ultra-metric tree
phylo.mat <- vcv(tree, corr = TRUE) ## we want a correlation matrix (bounded by -1 and 1)
phylo.mat <- phylo.mat[order(as.numeric(rownames(phylo.mat))), order(as.numeric(rownames(phylo.mat)))]


# Simulate response variable (phen) based on cofactor and phylogenetic matrix
u.s <- rnorm(k.species, 0, sqrt(sigma2.n))[sp.id]
u.p <- mvrnorm(1, mu=rep(0, k.species), Sigma=sigma2.p*phylo.mat)[sp.id]
ei <- rnorm(n, 0, sqrt(sigma2.e))

# get estimates of y
yi <- b0 + b1*x + u.s + u.p + ei

### append all to dataframe
dat <- cbind(dat, u.s, u.p, ei, b0, b1, yi)
dat$phylo <- dat$species # phylo ID variable (same as species) - needs to be numeric to work with INLA
dat$species <- factor(dat$species) # format species variable for models
dat$sp <- dat$species # create sp variable (for phyr)
dat$g <- 1 # add variable g constant (for glmmTMB)

View first five rows of the simulated dataset:

Code
head(dat)
  obs     x species     u.s      u.p       ei b0  b1    yi phylo sp g
1   1 12.66       1  0.2103  0.18051  0.17637  1 1.5 20.55     1  1 1
2   2 13.72       2 -0.2001  0.62222 -0.38096  1 1.5 21.62     2  2 1
3   3 15.73       3 -0.6851 -0.48357  1.18474  1 1.5 24.61     3  3 1
4   4 19.08       4  0.4939  0.07762  0.06977  1 1.5 30.26     4  4 1
5   5 12.02       5  0.7599 -0.24698  0.50544  1 1.5 20.04     5  5 1
6   6 18.98       6 -0.1544  0.08753 -1.02373  1 1.5 28.39     6  6 1

Run model with glmmTMB

To fit a PGLMM using glmmTMB package we specify the species-level random effect with phylogenetic relatedness using the propto covariance structure. In the first part we specify the random intercept (0 + species), followed by the grouping variable g (which we assume here as constant), and then the phylogenetic correlation matrix phylo.mat.

The model output will provide the fixed effect estimates, random effect variance estimates, and the residual variance estimate. The random effect variance estimates will be on the standard deviation scale, so we square them to get the variance estimates.

Code
# repeated measures per species
time.glmmTMB <- system.time({
  model_glmmTMB <- glmmTMB(yi ~ x + (1|species) + propto(0 + species|g, phylo.mat),
                           data = dat,
                           REML = TRUE)
})

Check if the model converged (i.e. returns TRUE if the Hessian is positive definite, for more details read here: https://cran.r-project.org/web/packages/glmmTMB/vignettes/troubleshooting.html):

Code
model_glmmTMB$sdr$pdHess
[1] TRUE

Run model with phyr

To fit the model using phyr package we use the cov_ranef argument to specify the phylogenetic tree directly. The random effect term (1|sp__) indicates that we want to include both phylogenetic and non-phylogenetic random effects. The underscores __ indicate that we want to include phylogenetic correlations in the model.

Run model with MCMCglmm

To fit the MCMCglmm model we first we need to set up a precision matrix for the phylogenetic random effect which is required by MCMCglmm. We use the inverseA function from the ape package to obtain the inverse of the phylogenetic covariance matrix. The prior list specifies the priors for the random effects and residual variance. The ginverse argument is used to specify the inverse of the phylogenetic covariance matrix. Then we specify the number of iterations, burn-in, and thinning parameters for the MCMC sampling. Here we increased the number of iterations by 30 times the default value to ensure convergence and stability of the MCMC chains, but this could be further increased. Usually you would want to set these to some reasonably large values.

Next we check the effective sample size (ESS) and the Heidelberger diagnostic test which checks if the MCMC chains have converged and are stationary. The ESS should be above 400 for both fixed and random effects (Vehtari et al., 2021), and the Heidelberger diagnostic should return TRUE for all parameters.

Code
# check ESS
min(effective_sample(model_mcmc, effects = "fixed")$ESS)
[1] 30000
Code
min(effective_sample(model_mcmc, effects = "random")$ESS)
[1] 3239
Code
# check Heidelberger diagnostic
fullchain <- cbind(as.mcmc(model_mcmc$Sol), as.mcmc(model_mcmc$VCV))
heidel.diag(fullchain)
                                          
            Stationarity start     p-value
            test         iteration        
(Intercept) passed       1         0.184  
x           passed       1         0.580  
species     passed       1         0.767  
phylo       passed       1         0.855  
units       passed       1         0.490  
                                     
            Halfwidth Mean  Halfwidth
            test                     
(Intercept) passed    0.845 0.005440 
x           passed    1.510 0.000114 
species     passed    0.149 0.002052 
phylo       passed    0.660 0.017493 
units       passed    0.208 0.000203 

Run model with brms

To fit the model using the brms package we specify the random effects using the (1|species) term for non-phylogenetic random effects and gr(phylo, cov = phylo.mat) for phylogenetic random effects. The data2 argument is used to pass the phylogenetic correlation matrix to the model (the same matrix format as for glmmTMB). We set the number of iterations, chains, and cores to reasonable values to ensure convergence and stability of the MCMC chains. Here we increased the number of iterations by 10 times the default value to ensure convergence and stability of the MCMC chains, but this could be further increased (and should be if ESS are low and Rhat values are higher than 1.01).

Check that the maximum effective sample size (ESS) of the model is high enough and check the Rhat value is below 1.01 (Vehtari et al. 2021):

Code
# check ESS 
min(bayestestR::effective_sample(model_brms, effects = "fixed")$ESS)
[1] 32115
Code
min(bayestestR::effective_sample(model_brms, effects = "random")$ESS)
[1] 9183
Code
# check RHat value is below 1.01 (Vehtari et al. 2021)
max(rhat(model_brms))<1.01
[1] TRUE
Code
# alternatively, we can use the bayestestR package to check the diagnostics of fixed effects
print(bayestestR::diagnostic_posterior(model_brms), digits = 4)
    Parameter   Rhat   ESS       MCSE
1 b_Intercept 1.0000 32115 0.00260528
2         b_x 0.9999 85222 0.00003449

Run model with INLA

For the INLA package we set up the model using the f() function to specify the random effects. The model = "iid" argument indicates that we want to use an independent and identically distributed (iid) random effect for species, while the model = "generic0" argument is used for the phylogenetic random effect. The Cmatrix argument is used to specify the phylogenetic correlation matrix. We also set up a penalizing complexity prior for the precision of the phylogenetic random effect (suggested by xxxx don’t have a ref but can mention it?).

Code
# set up recommended penalizing complexity priors 
pcprior = list(prec = list(prior="pc.prec", param = c(20, 0.1)))

time.inla <- system.time({
  model_inla <- inla(yi ~ x + f(species, model = "iid") + 
                       f(phylo, ## this needs to be a numeric to work
                         model = "generic0",
                         Cmatrix = phylo.prec.mat,
                         hyper=pcprior),
                     family = "gaussian",
                     data = dat)
})

fit_inla <- summary(model_inla)

Extract model estimates

First we extract the covariate \(x\) estimate and confidence interval for each model:

Code
## Format run time for each model ---------
time.phyr <- as.numeric(time.phyr[3])
time.glmmTMB <- as.numeric(time.glmmTMB[3])
time.brms <- as.numeric(time.brms[3])
time.mcmc <- as.numeric(time.mcmc[3])
time.inla <- as.numeric(time.inla[3])


## Fixed effect estimates -------

# phyr
coefs_phyr <- as.data.frame(fixef(model_phyr))
coefs_phyr$conf.low[2] <- coefs_phyr$Value[2] - coefs_phyr$Std.Error[2]*1.96
coefs_phyr$conf.high[2] <- coefs_phyr$Value[2] + coefs_phyr$Std.Error[2]*1.96

# glmmTMB
coefs_tmb <- as.data.frame(confint(model_glmmTMB, parm="beta_"))

# brms
coefs_brm <- as.data.frame(tidy(model_brms, effects="fixed", conf.int=TRUE))

# MCMCglmm 
coefs_mcmc <- as.data.frame(tidy(model_mcmc, effects="fixed", conf.int=TRUE))

# INLA
coefs_inla <- as.data.frame(fit_inla$fixed)


## Combine fixed effect results -----------------
res_fixed <- data.frame(
  model = c("phyr", "glmmTMB", "brms", "MCMCglmm", "INLA"),
  species_size = k.species,
  sample_size = n,
  run_time = c(time.phyr, time.glmmTMB, time.brms, time.mcmc, time.inla),
  b0 = rep(dat$b0[1], 5),
  b1 = rep(dat$b1[1], 5),
  mu = c(coefs_phyr$Value[2],
         coefs_tmb$Estimate[2], 
         coefs_brm$estimate[2], 
         coefs_mcmc$estimate[2], 
         coefs_inla$mean[2]),
  mu_ci_low = c(coefs_phyr$conf.low[2],
                coefs_tmb$`2.5 %`[2],
                coefs_brm$conf.low[2], 
                coefs_mcmc$conf.low[2], 
                coefs_inla$`0.025quant`[2]),
  mu_ci_high = c(coefs_phyr$conf.high[2],
                 coefs_tmb$`97.5 %`[2],
                 coefs_brm$conf.high[2], 
                 coefs_mcmc$conf.high[2], 
                 coefs_inla$`0.975quant`[2]),
  stringsAsFactors = FALSE
)
    

## Show table of results of runtime and fixed effects ------
kable(res_fixed, 
  caption = "Runtime and fixed effect estimates of the simulated model",
  col.names = c("Model", "Species size", "Sample size", "Run time (s)", "b0", "b1", "Estimate (b1)", "CI low (b1)", "CI high (b1)"),
  digits = 3,
  format = "html"
)
Runtime and fixed effect estimates of the simulated model
Model Species size Sample size Run time (s) b0 b1 Estimate (b1) CI low (b1) CI high (b1)
phyr 30 300 0.55 1 1.5 1.51 1.491 1.530
glmmTMB 30 300 1.00 1 1.5 1.51 1.491 1.530
brms 30 300 489.67 1 1.5 1.51 1.490 1.530
MCMCglmm 30 300 134.45 1 1.5 1.51 1.490 1.530
INLA 30 300 5.64 1 1.5 1.51 1.490 1.529

Extract the variance component estimates for each model and compare:

Code
#### Random effect estimates -------
  
# get phyr random effect variance estimates 
var_re_phyr <- c(as.numeric(model_phyr$ss[2])^2, #phylogenetic 
                 as.numeric(model_phyr$ss[1])^2, #non-phylogenetic 
                 as.numeric(model_phyr$ss[3])^2) #residual 
# combine into dataframe
sigma2_phyr <- data.frame(
  model = "phyr",
  group = c("phylo", "species", "Residual"),
  term = "var",
  estimate = var_re_phyr,
  std.error = NA,
  conf.low = NA, 
  conf.high = NA 
)


# get glmmTMB random effect variance estimates (by default it is on the standard deviation scale)
re_tmb <- as.data.frame(confint(model_glmmTMB, parm="theta_"))
species_tmb <- re_tmb[1, ]
phylo_tmb <- re_tmb[2, ]
# combine into dataframe
sigma2_tmb <- data.frame(
  model = "glmmTMB",
  group = c("phylo", "species", "Residual"),
  term = "var",
  estimate = c(phylo_tmb$Estimate^2,   #phylo variance estimates
               species_tmb$Estimate^2, #non-phylo variance estimates
               sigma(model_glmmTMB)^2),
  std.error = NA, #
  conf.low = c(phylo_tmb$`2.5 %`, species_tmb$`2.5 %`, NA), # Replace with the residual var CI if available
  conf.high = c(phylo_tmb$`97.5 %`, species_tmb$`97.5 %`, NA) # Replace with the residual var CI if available
)


# Compute variance, SE (delta method), and CI on variance scale)
var_est <- re_tmb$Estimate^2
var_se <- 2 * re_tmb$Estimate * (re_tmb$`97.5 %` - re_tmb$`2.5 %`) / (2 * 1.96)
var_ci_low <- re_tmb$`2.5 %`^2
var_ci_high <- re_tmb$`97.5 %`^2

# Residual variance
resid_var <- sigma(model_glmmTMB)^2

# Combine results
sigma2_tmb <- data.frame(
  model = "glmmTMB",
  group = c("phylo", "species", "Residual"),
  term = "var",
  estimate = c(var_est, resid_var),
  std.error = c(var_se, NA),
  conf.low = c(var_ci_low, NA),
  conf.high = c(var_ci_high, NA))



# get brms random effect variance estimates (standard deviation scale)
sigma_brms <- tidy(model_brms, effects="ran_pars")
sigma2_brms <- sigma_brms %>%
  mutate(model="brms",
         term=str_replace(term, "sd", "var"),
         estimate=estimate^2) %>%     ##compute variance estimates
  dplyr::select(model, group, term, estimate, std.error, conf.low, conf.high)


# get MCMCglmm random effect estimates (variance scale)
sigma2_mcmc <- tidy(model_mcmc, effects="ran_pars", conf.int=TRUE)
sigma2_mcmc <- sigma2_mcmc %>%
  mutate(model="MCMCglmm",
         group=str_replace(group,"animal", "phylo")) %>% 
  dplyr::select(model, group, term, estimate, std.error, conf.low, conf.high)


# get INLA random effect estimates (precision scale i.e. inverse variance)
re_inla <- 1/fit_inla$hyperpar
sigma2_inla <- data.frame(
  model = "INLA",
  group = c( "Residual", "species", "phylo"),
  term = "var",
  estimate = re_inla$mean,
  std.error = fit_inla$hyperpar$sd,
  conf.low = re_inla$`0.025quant`,
  conf.high = re_inla$`0.975quant`
)


# merge fixed results together
s2 <- as.data.frame(rbind(sigma2_phyr,
                          sigma2_tmb,
                          sigma2_brms,
                          sigma2_mcmc, 
                          sigma2_inla))

# get subsets for each group
s2_phylo <- s2 %>% filter(group=="phylo")
s2_sp <- s2 %>% filter(group=="species")
s2_res <- s2 %>% filter(group=="Residual")
    



## Combine random effect var estimates results -----------------
res_rand <- data.frame(
  model = c("phyr", "glmmTMB", "brms", "MCMCglmm", "INLA"),
  species_size = k.species,
  sample_size = n,
  run_time = c(time.phyr, time.glmmTMB, time.brms, time.mcmc, time.inla),
  sigma2_phylo = s2_phylo$estimate,
  sigma2_species = s2_sp$estimate,
  sigma2_residual = s2_res$estimate,
  stringsAsFactors = FALSE
)
    

## Show table of results of runtime and random variance ------
kable(res_rand, 
  caption = "Runtime and random component variance estimates of the simulated model",
  col.names = c("Model", "Species size", "Sample size", "Run time (s)", "Phylo variance est.", "Non-phylo variance est.", "Residual variance est."),
  digits = 3,
  format = "html"
)
Runtime and random component variance estimates of the simulated model
Model Species size Sample size Run time (s) Phylo variance est. Non-phylo variance est. Residual variance est.
phyr 30 300 0.55 0.346 0.639 0.206
glmmTMB 30 300 1.00 0.132 0.427 0.206
brms 30 300 489.67 0.568 0.140 0.208
MCMCglmm 30 300 134.45 0.660 0.149 0.208
INLA 30 300 5.64 0.549 0.077 0.205

3 Bayesian MCMC model runtime

Here we provide a breakdown of the compilation and sampling runtimes for the Bayesian MCMC models using the above simulated repeated measures dataset. The runtimes are provided for each model fitted using the MCMCglmm and brms packages. Note: the compilation time is the time taken to compile the model, while the sampling time is the time taken to sample from the posterior distribution of the model parameters.

First we set up the MCMCglmm model.

MCMCglmm

Set up MCMCglmm tuning parameters - usually set to some reasonably large values through trial and error.

Code
NITT <- 100000
BURNIN <- floor(0 * NITT)
THIN <- floor((NITT - BURNIN) / 1500)

Run a model that prepares the run but does not start actual sampling to have the baseline timing of the model pre-run protocols.

Code
runtime_1_pre <- benchmark(
    "process" = {
        model_mcmcglmm_pre <- MCMCglmm(yi ~ x,
                        random = ~species + phylo,
                        family = "gaussian",
                        ginverse = list(phylo = phylo.prec.mat),
                        prior = prior,
                        data = dat,
                        verbose = FALSE,
                        nitt = 1, burnin = 0, thin = 1
                        )
        }, replications = 1
)

runtime_1_pre$elapsed
[1] 0.06

Run full reasonable model (like above). Extract MCMC trace.

Code
THIN <- 1
model_mcmcglmm_max <- MCMCglmm(yi ~ x,
                        random = ~species + phylo,
                        family = "gaussian",
                        ginverse = list(phylo = phylo.prec.mat),
                        prior = prior,
                        data = dat,
                        verbose = FALSE,
                        nitt = NITT, burnin = BURNIN, thin = THIN
)
summary(model_mcmcglmm_max)

 Iterations = 1:100000
 Thinning interval  = 1
 Sample size  = 100000 

 DIC: 408.3 

 G-structure:  ~species

        post.mean      l-95% CI u-95% CI eff.samp
species      0.15 0.00000000102      0.3     3810

               ~phylo

      post.mean l-95% CI u-95% CI eff.samp
phylo     0.649  0.00653     1.63     1676

 R-structure:  ~units

      post.mean l-95% CI u-95% CI eff.samp
units     0.208    0.173    0.244    80737

 Location effects: yi ~ x 

            post.mean l-95% CI u-95% CI eff.samp    pMCMC    
(Intercept)    0.8410  -0.0909   1.8283   101752    0.076 .  
x              1.5102   1.4900   1.5294   100000 <0.00001 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
par_mcmcglmm_max <- model_mcmcglmm_max$VCV[, "species"]
plot(par_mcmcglmm_max)

Code
# geweke.plot(par_mcmcglmm_max)
# geweke.diag(par_mcmcglmm_max)
# heidel.diag(par_mcmcglmm_max)
test <- raftery.diag(par_mcmcglmm_max)

# update your run parameters
NITT <- test$resmatrix[,"N"] + test$resmatrix[,"M"]
THIN <- ceiling(test$resmatrix[,"N"] / test$resmatrix[,"Nmin"])
BURNIN <- test$resmatrix[,"M"]

runtime_1_run <- benchmark(
    "process" = {
        model_mcmcglmm_1_run <- MCMCglmm(yi ~ x,
                        random = ~species + phylo,
                        family = "gaussian",
                        ginverse = list(phylo = phylo.prec.mat),
                        prior = prior,
                        data = dat,
                        verbose = FALSE,
                        nitt = NITT, burnin = BURNIN, thin = THIN
                        )
        }, replications = 1
)

runtime_1_run$elapsed - runtime_1_pre$elapsed
[1] 10.16

BRMS

“Reasonable” brms parameters.

Code
NITTb <- 10000
BURNINb <- 1000
THINb <- 1
CHAINS <- 1

Run a model that prepares the run but does not start actual sampling to have the baseline timing of the model pre-run protocols.

Code
runtime_2_pre <- benchmark(
    "process" = {
        model_brms_1_pre <- brm(yi ~ x + (1|species) + (1|gr(phylo, cov = phylo.mat)),
            data = dat,
            data2 = list(phylo.mat = phylo.mat),
            chains = CHAINS,
            iter = 2,
            warmup = 1,
            thin = 1
        )
    }, replications = 1
)

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 0.000268 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.68 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: WARNING: No variance estimation is
Chain 1:          performed for num_warmup < 20
Chain 1: 
Chain 1: Iteration: 1 / 2 [ 50%]  (Warmup)
Chain 1: Iteration: 2 / 2 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0 seconds (Warm-up)
Chain 1:                0.006 seconds (Sampling)
Chain 1:                0.006 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 0.000152 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.52 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: WARNING: No variance estimation is
Chain 1:          performed for num_warmup < 20
Chain 1: 
Chain 1: Iteration: 1 / 2 [ 50%]  (Warmup)
Chain 1: Iteration: 2 / 2 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0 seconds (Warm-up)
Chain 1:                0.005 seconds (Sampling)
Chain 1:                0.005 seconds (Total)
Chain 1: 
Code
runtime_2_pre$elapsed
[1] 95.01

Run “larger” model and convert posterior to MCMC object.

Code
model_brms_max <- brm(yi ~ x + (1|species) + (1|gr(phylo, cov = phylo.mat)),
            data = dat,
            data2 = list(phylo.mat = phylo.mat),
            chains = CHAINS,
            iter = NITTb,
            warmup = BURNINb,
            thin = THINb
)

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 0.00037 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 3.7 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 1: Iteration: 1001 / 10000 [ 10%]  (Sampling)
Chain 1: Iteration: 2000 / 10000 [ 20%]  (Sampling)
Chain 1: Iteration: 3000 / 10000 [ 30%]  (Sampling)
Chain 1: Iteration: 4000 / 10000 [ 40%]  (Sampling)
Chain 1: Iteration: 5000 / 10000 [ 50%]  (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 11.219 seconds (Warm-up)
Chain 1:                80.201 seconds (Sampling)
Chain 1:                91.42 seconds (Total)
Chain 1: 
Code
summary(model_brms_max)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: yi ~ x + (1 | species) + (1 | gr(phylo, cov = phylo.mat)) 
   Data: dat (Number of observations: 300) 
  Draws: 1 chains, each with iter = 10000; warmup = 1000; thin = 1;
         total post-warmup draws = 9000

Multilevel Hyperparameters:
~phylo (Number of levels: 30) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.77      0.28     0.32     1.43 1.00     1209     1815

~species (Number of levels: 30) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.37      0.10     0.16     0.58 1.00     1410     1302

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.84      0.48    -0.08     1.82 1.00     6058     4676
x             1.51      0.01     1.49     1.53 1.00    22304     6394

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.46      0.02     0.42     0.50 1.00    14738     5317

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Code
par_brms_max <- as.mcmc(model_brms_max, pars = "sd_species__Intercept")[[1]]
plot(par_brms_max)

Code
test <- raftery.diag(par_brms_max)

# update your run parameters
NITTb <- test$resmatrix[,"N"] + test$resmatrix[,"M"]
THINb <- ceiling(test$resmatrix[,"N"] / test$resmatrix[,"Nmin"])
BURNINb <- test$resmatrix[,"M"]

runtime_2_run <- benchmark(
    "process" = {
        model_brms_1_run <- brm(yi ~ x + (1|species) + (1|gr(phylo, cov = phylo.mat)),
            data = dat,
            data2 = list(phylo.mat = phylo.mat),
            chains = CHAINS,
            iter = NITTb,
            warmup = BURNINb,
            thin = THINb
        )
    }, replications = 1
)

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 0.000219 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.19 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: WARNING: There aren't enough warmup iterations to fit the
Chain 1:          three stages of adaptation as currently configured.
Chain 1:          Reducing each adaptation stage to 15%/75%/10% of
Chain 1:          the given number of warmup iterations:
Chain 1:            init_buffer = 3
Chain 1:            adapt_window = 19
Chain 1:            term_buffer = 2
Chain 1: 
Chain 1: Iteration:     1 / 26925 [  0%]  (Warmup)
Chain 1: Iteration:    25 / 26925 [  0%]  (Sampling)
Chain 1: Iteration:  2716 / 26925 [ 10%]  (Sampling)
Chain 1: Iteration:  5408 / 26925 [ 20%]  (Sampling)
Chain 1: Iteration:  8100 / 26925 [ 30%]  (Sampling)
Chain 1: Iteration: 10792 / 26925 [ 40%]  (Sampling)
Chain 1: Iteration: 13484 / 26925 [ 50%]  (Sampling)
Chain 1: Iteration: 16176 / 26925 [ 60%]  (Sampling)
Chain 1: Iteration: 18868 / 26925 [ 70%]  (Sampling)
Chain 1: Iteration: 21560 / 26925 [ 80%]  (Sampling)
Chain 1: Iteration: 24252 / 26925 [ 90%]  (Sampling)
Chain 1: Iteration: 26925 / 26925 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.036 seconds (Warm-up)
Chain 1:                3.657 seconds (Sampling)
Chain 1:                3.693 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 0.000302 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 3.02 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: WARNING: There aren't enough warmup iterations to fit the
Chain 1:          three stages of adaptation as currently configured.
Chain 1:          Reducing each adaptation stage to 15%/75%/10% of
Chain 1:          the given number of warmup iterations:
Chain 1:            init_buffer = 3
Chain 1:            adapt_window = 19
Chain 1:            term_buffer = 2
Chain 1: 
Chain 1: Iteration:     1 / 26925 [  0%]  (Warmup)
Chain 1: Iteration:    25 / 26925 [  0%]  (Sampling)
Chain 1: Iteration:  2716 / 26925 [ 10%]  (Sampling)
Chain 1: Iteration:  5408 / 26925 [ 20%]  (Sampling)
Chain 1: Iteration:  8100 / 26925 [ 30%]  (Sampling)
Chain 1: Iteration: 10792 / 26925 [ 40%]  (Sampling)
Chain 1: Iteration: 13484 / 26925 [ 50%]  (Sampling)
Chain 1: Iteration: 16176 / 26925 [ 60%]  (Sampling)
Chain 1: Iteration: 18868 / 26925 [ 70%]  (Sampling)
Chain 1: Iteration: 21560 / 26925 [ 80%]  (Sampling)
Chain 1: Iteration: 24252 / 26925 [ 90%]  (Sampling)
Chain 1: Iteration: 26925 / 26925 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.225 seconds (Warm-up)
Chain 1:                631.601 seconds (Sampling)
Chain 1:                631.826 seconds (Total)
Chain 1: 
Code
runtime_2_run$elapsed - runtime_2_pre$elapsed
[1] 624.9

4 Case studies

4.1 Background

The following case studies illustrate how phylogenetic mixed models can be applied to diverse questions in ecology and evolution. The first examines patterns of plumage colour in birds, while the second explores macroevolutionary patterns in plant traits. Together, they demonstrate the flexibility of these methods across taxa and research questions.

All models, results, and datasets presented in these case studies are intended solely for illustrative purposes. They are designed to demonstrate analytical approaches rather than to provide definitive biological insights. No substantive conclusions should be drawn from these analyses.

4.2 Case study 1: Bird color evolution

Birds are a valuable system for studying ecological and evolutionary processes because their diversity in form, behaviour, and coloration is well documented across many species and habitats. Variation in plumage colour, in particular, can provide insights into mechanisms such as sexual selection, camouflage, and signalling. By examining these traits, researchers can better understand how environmental and evolutionary pressures shape biodiversity.

Data overview

Load data bird spectrum data:

Code
# Load bird spectrum data 
bird_data <- read_delim("data/Spec_IndivReg_Coralie.csv", 
    delim = ";", escape_double = FALSE, locale = locale(decimal_mark = ","), 
    trim_ws = TRUE)

# summary of birds species and genus
length(table(bird_data$genus_original)) # 446 genus
[1] 446
Code
length(table(bird_data$sci_name_Jetz)) # 949 species
[1] 949
Code
#length(table(bird_data$sci_name_original)) # 952 original species

# Remove the specified species from bird_data
bird_data <- bird_data |> 
  filter(!sci_name_Jetz %in% c("Basileuterus_rufifrons", "Malurus_lamberti", "Malurus_splendens"))


########## Notes #########

# missing values ---> not sure why?
#na_count <- colSums(is.na(dat2))
#na_count[na_count > 0]

# individuals a-c are male and d-f are female
#table(bird_data$sex, bird_data$individual)

# number of measurements?
#table(bird_data$Nmeasured)

# duplicates per individual per body region
#dup <- dat |>
#    summarise(n = n(), 
#              .by = c(wl, individual_nonrep, sex, body_region))
#r <- dup[which(dup$n>1),]
#table(r$individual_nonrep)

Some notes:

  • The dataset contains data for 949 bird species, with 446 unique genera.

  • Spectral values are normalised reflectance data.

  • Nmeasured is the number of measurements per body patch.

  • Each body region measured 5 times (then averaged).

  • These species seem to have two measurements per body region per individual, which we remove to facilitate the analysis.:

    Basileuterus_rufifrons, Malurus_lamberti, Malurus_splendens

Color data set-up

Obtain wavelength dataset

Now we want to create a wavelength dataset using the pavo package given the spectral reflectance data.

Code
# set up wavelength dataset (wavelengths columns 300 to 474)
dat <- bird_data |> 
  pivot_longer(cols = `300`:`700`, names_to="wl", values_to="refl") |> 
  dplyr::select(individual_nonrep, body_region, sex, wl, refl) |> 
  mutate(wl = as.numeric(wl))

# pivot so each column is an individual body region measurement
dat2 <- dat |> 
  pivot_wider(names_from = c("individual_nonrep", "sex", "body_region"),
              values_from = "refl", names_sep = ".")

# create spectral dataset with pavo
specs <- as.rspec(dat2)

# some examples of individual birds body region color spectrum
plot(Acrocephalus_palustris_a.Male.throat ~ wl, type="l", data=specs, title="Acrocephalus palustris (throat)")

Code
plot(Alle_alle_a.Male.throat ~ wl, type="l", data=specs, title="Alle alle (throat)")

Code
plot(Aplonis_metallica_a.Male.wing_cov ~ wl, type = "l", data = specs, title="Aplonis metallica (wing covert)")

Code
# use procspec to adjust negative values by shifting them by 10
# (this maybe due to darker colors)
specs <- procspec(specs, fixneg="addmin")
plot(specs, select = 10)

Obtain spectral shape descriptors

Using the wavelength dataset we can now obtain spectral shape descriptors. These descriptors will be used in the model. Here we will focus on four descriptors of interest: brightness (B1), spectral slope (S1), spectral curvature (S9), and hue (H4). More information about the spectral shape descriptors can be found here: https://book.colrverse.com/spectral-shape-descriptors.html

Code
# Obtain spectral shape descriptors
spec.des <- summary(specs, subset = c("B1", "B2", "S9", "H4")) ## using subset makes it faster to run by selecting the shapes of interest

dev.off()
null device 
          1 
Code
# distribution of B1
ggplot(spec.des, aes(x = B1)) +
  geom_histogram(bins = 30, colour = "black", fill = "grey") +
  labs(x = "B1", title = "B1: Total brightness") +
  theme_bw()

# distribution of S9
ggplot(spec.des, aes(x = S9)) +
  geom_histogram(bins = 30, colour = "black", fill = "grey") +
  labs(x = "S9", title = "S9: Carotenoid chroma") +
  theme_bw()

# vdistribution o fH4
ggplot(spec.des, aes(x = H4)) +
  geom_histogram(bins = 30, colour = "black", fill = "grey") +
  labs(x = "H4", title = "H4: Hue (segment classification)") +
  theme_bw()

The first dataset is the shape descriptors dataset, which we will use to model the continuous trait (brightness).

Code
spec.des$rowname <- rownames(spec.des)

spec.dat <- spec.des |> 
  mutate(sex = case_when(
    grepl("Male", rowname) ~ "male",
    grepl("Female", rowname) ~ "female"),
    species = gsub("\\_[a-f]\\..*$", "", rowname),
    body_region = str_extract(rowname, "[a-z]+$")
  ) 

write.csv(spec.dat, file="data/spec_data_for_model.csv")

Obtain carotenoid datasets: binary and ordinal traits

We will use the carotenoid dataset to model the presence/absence of carotenoid coloration in birds. This dataset is based on the spectral reflectance data and the spectral shape descriptors.

Let’s set up the dataset for modelling a binary trait (absence/presence of carotenoid color):

Code
# 1) get human visual model (CIE 10 degree observed under D65 "day light")
vm <- vismodel(specs, visual = "cie10", illum = "D65", bkg = "ideal", relative = FALSE)

# 2) convert to CIELAB colorspace (to get hue and chroma)
lab <- colspace(vm, space = "cielab") 

# 3) get chroma and hue angle (converting CIELAB to CIELCh)
L <- lab$L
a <- lab$a
b <- lab$b
C <- sqrt(a^2 + b^2)
h <- (atan2(b, a) * 180 / pi) %% 360  # degrees in 0 to 360

# 4) thresholds for carotenoid range and saturation
h_lower <- 330   # start of red
h_upper <- 100   # end of yellow
C_min   <- 15 # min chroma
L_min   <- 20 # avoid very dark samples

# 5) Hue range test 
in_range <- h >= h_lower | h <= h_upper

# 6) COmpute binary trait:  1 = carotenoid like colour present, 0 = absent
present <- as.integer(in_range & C >= C_min & L >= L_min)

# set up dataset
carot.dat.all <- data.frame(
  rowname  = rownames(lab),
  L = L, a = a, b = b, C = C, h = h,
  carotenoid = present
)

# merge with spec.data based on "rowname" column
carot.dat.all <- merge(spec.dat, carot.dat.all, by.x = "rowname", by.y = "rowname", all = TRUE)
# create individual id variable
carot.dat.all$indiv_rep <- sub(".*_([a-f])\\..*$", "\\1", carot.dat.all$rowname)

# number of observations with carotenoid like colour
table(carot.dat.all$carotenoid)

    0     1 
33520   506 
Code
# # checks
# carot.dat.all |>
#   filter(L>20 & C>12 & in_range) |>
#   arrange(desc(L))

# #  check bird cardinalis cardinalis (should be yellow or red?)
# carot.dat.all |> 
#   filter(grepl("Cardinalis_cardinalis", rowname)) |> 
#   arrange(desc(h))


# Plot
ggplot(carot.dat.all, aes(h, C)) +
  geom_point(data = subset(carot.dat.all, carotenoid == 0),
             shape = 21, size = 1.5, fill = "grey85", colour = "grey70", alpha = 0.6) +
  geom_point(data = subset(carot.dat.all, carotenoid == 1),
             aes(fill = L, colour = L), shape = 21, size = 1.8, stroke = 0.4, alpha = 0.95) +
  scale_fill_gradient(name = "L*", limits = c(15, 50), low = "darkred", high = "lightcoral") +
  scale_colour_gradient(guide = "none", limits = c(15, 50), low = "darkred", high = "lightcoral") +
  labs(x = "Hue (degrees)", y = "Chroma (C*)", title = "Carotenoid presence") +
  theme_bw()

Let’s save the dataset for modelling later on:

Code
carot.dat <- carot.dat.all

# save as csv
write.csv(carot.dat, file="data/carotenoid_data_for_model.csv", row.names = FALSE)

Now let’s obtain the dataset where we summarise for each individual bird the proportion of body region with carotenoid color presence (ordinal data trait), and save it for modelling later on:

Code
carot.ordinal <- carot.dat.all |>
  group_by(species, sex, indiv_rep) |>
  summarise(prop_carotenoid = mean(carotenoid), .groups = "drop")

# save as csv
write.csv(carot.ordinal, file="data/ordinal_data_for_model.csv", row.names = FALSE)

Phylogenetic correlation matrix set-up

Load and view phylogenetic tree:

Code
# Load bird tree (consensus tree = "combined tree")
bird.tree <- read.tree("data/Stage2_Hackett_MCC_no_neg.tre")

### Prune bird tree
bird.pruned <- keep.tip(bird.tree, bird_data$sci_name_Jetz)
# check whether names match in data and tree
check <- name.check(bird.pruned, bird_data$sci_name_Jetz, sort(bird.pruned$tip.label))

# plot tree
plotTree(bird.pruned, ftype="i", fsize=0.4, lwd=1, type="fan")

Code
dev.off()
null device 
          1 

Set up correlation matrix for glmmTMB model and check it corresponds to the species labels in the data:

Code
# set up phylogenetic correlation matrix
phylo.mat <- vcv(bird.pruned, corr = TRUE) 
phylo.mat <- phylo.mat[sort(rownames(phylo.mat)), sort(rownames(phylo.mat))]
saveRDS(phylo.mat, file = "data/phylo_matrix.rds")


# checks   
# length(colnames(phylo.mat))==length(table(spec.dat$species))
# all(head(rownames(phylo.mat))==head(colnames(phylo.mat)))
# head(table(spec.dat$species))

Models

  1. Continuous trait (brightness): we compare models with distributions of gaussian, lognormal, gamma, and skew normal.
  2. Binary trait (absence/presence of color): binomial distribution.
  3. Ordinal trait (100% of body region with color): beta binomial.

1. Continuous trait

Code
# load data
spec.dat <- read_csv("data/spec_data_for_model.csv")
phylo.mat <- readRDS("data/phylo_matrix.rds")

# load library                                 
library(glmmTMB)

# add grouping variable (set it to 1) - this is necessary to fit the glmmTMB model
spec.dat$g <- 1
Modelling total brightness (B1)

The brightness trait is right-skewed (as shown above), which is consistent with multiplicative evolutionary change. To identify an appropriate sampling distribution, we will fit four models with an identical linear predictor and random effects structure:

  1. Gaussian to model \(\log(B1)\)
  2. Gamma with a log link to model \(B1\)

For each model we will examine simulated standardised residuals using to assess nonlinearity and heteroscedasticity. If two or more models perform similarly, we will prefer the model with clearer interpretation on the original scale.

Code
# # Little checks ---------
# library(fitdistrplus)
# descdist(spec.dat$B1, discrete=FALSE)
# gamma_dist <- fitdist(spec.dat$B1, "gamma")
# plot(gamma_dist)
# norm_dist <- fitdist(log(spec.dat$B1), "norm")
# plot(norm_dist)
# lnorm_dist <- fitdist(spec.dat$B1, "lnorm")
# plot(lnorm_dist)


# Fit models --------------

# normal
time_norm <- system.time(
  m1 <- glmmTMB(log(B1) ~ body_region * sex + (1|species) + propto(0 + species|g,phylo.mat),
                      family = gaussian(),
                      data = spec.dat) # don't use REML (to get AIC)
  )

# Gamma distribution
time_gamma <- system.time(
  m2 <- glmmTMB(B1 ~ body_region * sex + (1|species) + propto(0 + species|g,phylo.mat),
                      family = Gamma(link = "log"),
                      data = spec.dat)
)

# # Lognormal ---> not working get an error that the link function is not supported?
# t_lnorm <- system.time(
#   modb1_lnorm <- glmmTMB(B1 ~ body_region * sex + (1|species) + propto(0 + species|g,phylo.mat), family = lognormal(link = "log"), data = spec.dat)
# )


# Get model info  ---------------------------
# check whether model has postive definite hessian

b1_output <- data.frame(
  model = c("Gaussian (log)", "Gamma"),
  convergence = c(m1$sdr$pdHess, m2$sdr$pdHess),
  runtime = c(time_norm[["elapsed"]], time_gamma[["elapsed"]])
)

b1_output
           model convergence runtime
1 Gaussian (log)        TRUE   146.2
2          Gamma        TRUE   219.3
Model diagnostics

Let’s check residual plots with the DHARMa packages. First, the log(B1) assuming normal distribution residual checks:

Code
res_gauss <- DHARMa::simulateResiduals(fittedModel = m1)
plot(res_gauss)

The model checks assuming Gamma distribution and log link function:

Code
res_gamma <- DHARMa::simulateResiduals(fittedModel = m2)
plot(res_gamma)

We found that the Gamma distribution shows improve model fit as the residual plots

Model estimates

Let’s obtain the estimate of the phylogenetic signal in the :

Code
m2_re <- as.data.frame(confint(m2, parm="theta_"))
sigma2_s <- m2_re$Estimate[1] # non-phylogenetic variance
sigma2_p <- m2_re$Estimate[2] # phylogenetic variance estimate

p.signal <- sigma2_p / (sigma2_p + sigma2_s)
p.signal
[1] 0.6977

On the log mean scale of the Gamma model, the phylogenetic species effect explained 69.8% of the total species level random effect variance i.e. this represents the degree of phylogenetic signal in the overall variance sourced from species.

To assess differences in total brightness between sexes for each body region, we compute marginal means from the fitted model and perform pairwise comparisons between females and males. The resulting ratios (female/male) and their confidence intervals are then plotted to visualise the magnitude and direction of differences across body regions.

Code
emm_b1 <- emmeans(m2, ~ sex | body_region, type = "response")
b1_sex_diff <- contrast(emm_b1, method = "pairwise")


## Transform to response scale as a ratio
sex_diff_df <- as.data.frame(b1_sex_diff)
sex_diff_df$lower.CL <- sex_diff_df$ratio - sex_diff_df$SE
sex_diff_df$upper.CL <- sex_diff_df$ratio + sex_diff_df$SE

## plot pairwise differences between female and male total brightness
sex_diff_df |> 
  arrange(ratio) |>
  mutate(body_region = factor(body_region, unique(body_region), ordered = T)) |> 
ggplot(aes(y = body_region, x = ratio, color = body_region)) +
  geom_point(size = 3) +
  geom_errorbar(aes(xmin = lower.CL, xmax = upper.CL), width = 0.2) +
  geom_vline(xintercept = 1, linetype = "dashed", colour = "grey40") +
  labs(y = "Body Region", x = "Female / Male Ratio", 
       title = "Brightness") +
  theme_bw() +
  theme(legend.position = "none")  # Optional: remove legend if redundant

Ratios \(>1\) indicate greater brightness in females, while ratios \(<1\) indicate greater brightness in males.

2. Binary trait

Binary pigment traits, such as the presence of carotenoid colors, may be associated with distinct evolutionary and ecological drivers.

Here we want to model the absence or presence of carotenoid across all body regions accounting for sex and including species- and phylogeny-level random effects. We will use the carotenoid dataset to model based on the spectral reflectance data which we derived above.

Modelling carotenoid presence

We will fit two models: - Binomial model with a logit link function - beta binomial model with a logit link function

Code
# Fit models --------------

# load dat
carot.dat <- read_csv("data/carotenoid_data_for_model.csv")
carot.dat$g <- 1 
 
# binomial model
time_binom <- system.time(
  m3 <- glmmTMB(carotenoid ~ body_region * sex + (1|species) + propto(0 + species|g,phylo.mat),
                family = binomial(link = "logit"),
                data = carot.dat)
)

# beta binomial
time_bbinom <- system.time(
  m4 <- glmmTMB(carotenoid ~ body_region * sex + (1|species) + propto(0 + species|g,phylo.mat),
                family = betabinomial(link = "logit"),
                data = carot.dat)
)

# Get model info -----------

carotmod_output <- data.frame(
  model = c("Binomial", "Beta binomial"),
  convergence = c(m3$sdr$pdHess, m4$sdr$pdHess),
  runtime = c(time_binom[["elapsed"]], time_bbinom[["elapsed"]]),
  AIC = c(AIC(m3), AIC(m4))
)
carotmod_output 
          model convergence runtime  AIC
1      Binomial        TRUE   118.6 3094
2 Beta binomial        TRUE   328.4 3096
Model diagnostics

Let’s check residual plots with the DHARMa packages. First, let’s get the residual plots for the binomial model:

Code
res_binom <- DHARMa::simulateResiduals(fittedModel = m3)
plot(res_binom)

Now the model checks assuming zero inflated binomial distribution and log link function:

Code
res_bbinom <- DHARMa::simulateResiduals(fittedModel = m4)
plot(res_bbinom)

We found that the binomial model has improved model fit in AIC and in the residual plots.

Model estimates

Get model estimates:

Code
library(emmeans)

# Estimated marginal means on the response scale (odds)
emm_m3 <- emmeans(m3, ~ sex | body_region, type = "response")

# get contrasts (odds ratios)
m3_sex_diff <- contrast(emm_m3, method = "pairwise") |> 
  summary(infer = TRUE, type = "response")

# Put into data frame for plotting
sex_diff_df <- as.data.frame(m3_sex_diff) |>
  rename(ratio = odds.ratio, lower.CL = asymp.LCL, upper.CL = asymp.UCL)

# Plot
sex_diff_df |>
  arrange(ratio) |>
  mutate(body_region = factor(body_region, unique(body_region), ordered = TRUE)) |>
  ggplot(aes(y = body_region, x = ratio, colour = body_region)) +
  geom_point(size = 3) +
  geom_errorbar(aes(xmin = lower.CL, xmax = upper.CL), width = 0.2) +
  geom_vline(xintercept = 1, linetype = "dashed", colour = "grey40") +
  labs(y = "Body Region", x = "Female / Male Odds Ratio", 
       title = "Sex differences in carotenoid presence") +
  theme_bw() +
  theme(legend.position = "none")

3. Ordinal trait (TBD)

The ordinal trait is the percentage of body region with carotenoid presence. We will fit a beta-binomial and model to this trait, which is suitable for modeling proportions.

Modelling carotenoid proportion per body region
Code
# load data
ord.dat <- read_csv("data/ordinal_data_for_model.csv")
ord.dat$g <- 1 # add grouping variable 

# fit model 
time_ord <- system.time(
  m5 <- glmmTMB(prop_carotenoid ~ sex + (1|species) + propto(0 + species|g,phylo.mat),
                  family = ordbeta(),
                  data = ord.dat)
)

# output
data.frame(
  model = "Ordinal beta",
  convergence = m5$sdr$pdHess,
  runtime = time_ord[["elapsed"]]
)
         model convergence runtime
1 Ordinal beta        TRUE   127.9
Model diagnostics

Look at residual diagnostic plots:

Code
res_ord <- DHARMa::simulateResiduals(fittedModel = m5)
plot(res_ord)

Model estimates

Now look at the difference in the proportion of carotenoid colour between females and males from the ordinal beta model (m5), and plot the model-based estimate with its 95% confidence interval on the response scale.

Code
emm_m5 <- emmeans(m5, ~ sex, type = "response")

# pairwise contrast: Female vs Male
m5_sex_OR <- contrast(emm_m5, method = "pairwise") |>
  summary(infer = TRUE, type = "response")

# get summary with CI
sex_OR <- as.data.frame(m5_sex_OR) |>
  rename(OR = odds.ratio, lower.CL = asymp.LCL, upper.CL = asymp.UCL)

sex_OR
 contrast          OR     SE  df lower.CL upper.CL null z.ratio p.value
 female / male 0.9169 0.0246 Inf   0.8699   0.9664    1  -3.235  0.0012

Confidence level used: 0.95 
Intervals are back-transformed from the log odds ratio scale 
Tests are performed on the log odds ratio scale 

4.3 Case study 2: Evolution of plant hydraulic traits

We re-analysed the published study of Sanchez-Martinez et al. (2020) on the evolution of plant hydraulic traits using the phylogenetic generalized linear mixed models (PGLMMs) framework. The original study used a Bayesian MCMC approach to fit the models, but here we will use the glmmTMB packages to fit the models and assess different sampling distributions. NOTE: or just compare with MCMCglmm and show that glmmTMB is faster and has added functionality of gamma?

(log gaussian + gamma + MCMCglmm )

Data overview

Code
# Load plant hydraulic traits dataset
hydra.dat <- read_csv("data/HydraEvol2020.csv")

# Have a look at distribution of traits
hist(hydra.dat$Ks, breaks = 50, main = "Distribution of Ks", xlab = "Ks")

Code
hist(hydra.dat$P50, breaks = 50, main = "Distribution of P50", xlab = "P50")

Phylogenetic tree

Code
# Load bird tree (consensus tree = "combined tree")
plant.tree <- ape::read.tree("data/genus-level_phylogeny.tre")
# plot tree
plotTree(plant.tree, ftype="i", fsize=0.4, lwd=1, type="fan")

Models

First let’s set up the models for glmmTMB: - Try gamma model - Log(response) gaussian

Code
# log_Ks ~ animal + genus.Rand
# log_negP50 ~ animal + genus.Rand
# log_Hv ~ animal + genus.Rand
# log_negMinWP_md ~ animal + genus.Rand

Same models with MCMCglmm

Compare model outputs + run time

Model results

5 References

Brooks, M. E., Kristensen, K., van Benthem, K. J., Magnusson, A., Berg, C. W., Nielsen, A., Skaug, H. J.,Machler, M., & Bolker, B. M. (2017). glmmTMB balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling. R Journal, 9 (2), 378–400. https://doi.org/10.32614/RJ-2017-066

Peter O. Dunn et al. ,Natural and sexual selection act on different axes of variation in avian plumage color.Sci. Adv.1,e1400155(2015).DOI:10.1126/sciadv.1400155

Hadfield, J. D. (2024, May). MCMCglmm: MCMC Generalised Linear Mixed Models. Retrieved October 7, 2024, from https://cran.r-project.org/web/packages/MCMCglmm/index.html

Montgomerie R. 2006. Analyzing colors. In Hill, G.E, and McGraw, K.J., eds. Bird Coloration. Volume 1 Mechanisms and measurements. Harvard University Press, Cambridge, Massachusetts.

Sanchez-Martinez, P., Martinez-Vilalta, J., Dexter, K. G., Segovia, R. A., & Mencuccini, M. (2020). Adaptation and coordinated evolution of plant hydraulic traits. Ecology Letters, 23 (11), 1599–1610. https://doi.org/10.1111/ele.13584

6 Session information

Code
library(sessioninfo)
library(details)

si <- session_info()
si$packages <- si$packages 
  # |> filter(package %in% c("metafor", "ape", "clubSandwich", "Matrix", "corpcor", "dplyr", "kableExtra", "xtable", "rotl", "Hmisc", "lattice"))

details(si, summary = 'Current session info', open = FALSE)
Current session info

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.2 (2024-10-31 ucrt)
 os       Windows 11 x64 (build 26100)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Australia.utf8
 ctype    English_Australia.utf8
 tz       America/Denver
 date     2025-08-14
 pandoc   3.4 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 ! package           * version    date (UTC) lib source
   abind               1.4-8      2024-09-12 [1] CRAN (R 4.4.1)
   ape               * 5.8-1      2024-12-16 [1] CRAN (R 4.4.2)
   backports           1.5.0      2024-05-23 [1] CRAN (R 4.4.0)
   bayesplot           1.11.1     2024-02-15 [1] CRAN (R 4.4.1)
   bayestestR        * 0.15.0     2024-10-17 [1] CRAN (R 4.4.2)
   bit                 4.5.0.1    2024-12-03 [1] CRAN (R 4.4.2)
   bit64               4.5.2      2024-09-22 [1] CRAN (R 4.4.1)
   boot                1.3-31     2024-08-28 [1] CRAN (R 4.4.1)
   bridgesampling      1.1-2      2021-04-16 [1] CRAN (R 4.4.1)
   brms              * 2.22.0     2024-09-23 [1] CRAN (R 4.4.1)
   Brobdingnag         1.2-9      2022-10-19 [1] CRAN (R 4.4.1)
   broom               1.0.7      2024-09-26 [1] CRAN (R 4.4.1)
   broom.mixed       * 0.2.9.6    2024-10-15 [1] CRAN (R 4.4.2)
   cachem              1.1.0      2024-05-16 [1] CRAN (R 4.4.1)
   cellranger          1.1.0      2016-07-27 [1] CRAN (R 4.4.1)
   checkmate           2.3.2      2024-07-29 [1] CRAN (R 4.4.1)
   class               7.3-23     2025-01-01 [1] CRAN (R 4.4.2)
   classInt            0.4-11     2025-01-08 [1] CRAN (R 4.4.1)
   cli                 3.6.3      2024-06-21 [1] CRAN (R 4.4.1)
   clipr               0.8.0      2022-02-22 [1] CRAN (R 4.4.1)
   cluster             2.1.8      2024-12-11 [1] CRAN (R 4.4.2)
   clusterGeneration   1.3.8      2023-08-16 [1] CRAN (R 4.4.1)
   coda              * 0.19-4.1   2024-01-31 [1] CRAN (R 4.4.3)
   codetools           0.2-20     2024-03-31 [2] CRAN (R 4.4.2)
   colorspace          2.1-1      2024-07-26 [1] CRAN (R 4.4.1)
   combinat            0.0-8      2012-10-29 [1] CRAN (R 4.4.0)
   corpcor             1.6.10     2021-09-16 [1] CRAN (R 4.4.0)
   crayon              1.5.3      2024-06-20 [1] CRAN (R 4.4.1)
   cubature            2.1.1      2024-07-14 [1] CRAN (R 4.4.1)
   data.table        * 1.16.4     2024-12-06 [1] CRAN (R 4.4.2)
   DBI                 1.2.3      2024-06-02 [1] CRAN (R 4.4.1)
   DEoptim             2.2-8      2022-11-11 [1] CRAN (R 4.4.1)
   desc                1.4.3      2023-12-10 [1] CRAN (R 4.4.1)
   deSolve             1.40       2023-11-27 [1] CRAN (R 4.4.1)
   details           * 0.3.0      2022-03-27 [1] CRAN (R 4.4.1)
   devtools          * 2.4.5      2022-10-11 [1] CRAN (R 4.4.2)
   DHARMa            * 0.4.7      2024-10-18 [1] CRAN (R 4.4.3)
   digest              0.6.37     2024-08-19 [1] CRAN (R 4.4.1)
   distributional      0.5.0      2024-09-17 [1] CRAN (R 4.4.1)
   doParallel          1.0.17     2022-02-07 [1] CRAN (R 4.4.1)
   dplyr             * 1.1.4      2023-11-17 [1] CRAN (R 4.4.1)
   e1071               1.7-16     2024-09-16 [1] CRAN (R 4.4.2)
   ellipsis            0.3.2      2021-04-29 [1] CRAN (R 4.4.1)
   emmeans           * 1.10.6     2024-12-12 [1] CRAN (R 4.4.2)
   estimability        1.5.1      2024-05-12 [1] CRAN (R 4.4.1)
   evaluate            1.0.3      2025-01-10 [1] CRAN (R 4.4.2)
   expm                1.0-0      2024-08-19 [1] CRAN (R 4.4.1)
   farver              2.1.2      2024-05-13 [1] CRAN (R 4.4.1)
   fastmap             1.2.0      2024-05-15 [1] CRAN (R 4.4.1)
   fastmatch           1.1-6      2024-12-23 [1] CRAN (R 4.4.2)
   fmesher             0.2.0      2024-11-06 [1] CRAN (R 4.4.2)
   forcats           * 1.0.0      2023-01-29 [1] CRAN (R 4.4.1)
   foreach             1.5.2      2022-02-02 [1] CRAN (R 4.4.1)
   fs                  1.6.5      2024-10-30 [1] CRAN (R 4.4.2)
   furrr               0.3.1      2022-08-15 [1] CRAN (R 4.4.1)
   future              1.34.0     2024-07-29 [1] CRAN (R 4.4.1)
   future.apply        1.11.3     2024-10-27 [1] CRAN (R 4.4.2)
   geiger            * 2.0.11     2023-04-03 [1] CRAN (R 4.4.1)
   generics            0.1.3      2022-07-05 [1] CRAN (R 4.4.1)
   geometry            0.5.0      2024-08-31 [1] CRAN (R 4.4.1)
   ggplot2           * 3.5.1      2024-04-23 [1] CRAN (R 4.4.1)
   glmmTMB           * 1.1.11     2025-04-03 [1] Github (coraliewilliams/glmmTMB@5872941)
   globals             0.16.3     2024-03-08 [1] CRAN (R 4.4.0)
   glue                1.8.0      2024-09-30 [1] CRAN (R 4.4.2)
   gridExtra           2.3        2017-09-09 [1] CRAN (R 4.4.1)
   gtable              0.3.6      2024-10-25 [1] CRAN (R 4.4.2)
   here              * 1.0.1      2020-12-13 [1] CRAN (R 4.4.1)
   hms                 1.1.3      2023-03-21 [1] CRAN (R 4.4.1)
   htmltools           0.5.8.1    2024-04-04 [1] CRAN (R 4.4.1)
   htmlwidgets         1.6.4      2023-12-06 [1] CRAN (R 4.4.1)
   httpuv              1.6.15     2024-03-26 [1] CRAN (R 4.4.1)
   httr                1.4.7      2023-08-15 [1] CRAN (R 4.4.1)
   igraph              2.1.3      2025-01-07 [1] CRAN (R 4.4.2)
   INLA              * 24.06.27   2024-06-27 [1] local
   inline              0.3.21     2025-01-09 [1] CRAN (R 4.4.1)
   insight             1.0.1      2025-01-10 [1] CRAN (R 4.4.1)
   iterators           1.0.14     2022-02-05 [1] CRAN (R 4.4.1)
   jsonlite            1.8.9      2024-09-20 [1] CRAN (R 4.4.1)
   KernSmooth          2.23-26    2025-01-01 [1] CRAN (R 4.4.2)
   knitr             * 1.49       2024-11-08 [1] CRAN (R 4.4.2)
   labeling            0.4.3      2023-08-29 [1] CRAN (R 4.4.0)
   later               1.4.1      2024-11-27 [1] CRAN (R 4.4.2)
   lattice             0.22-6     2024-03-20 [2] CRAN (R 4.4.2)
   lifecycle           1.0.4      2023-11-07 [1] CRAN (R 4.4.1)
   lightr              1.8.0      2024-12-01 [1] CRAN (R 4.4.2)
   listenv             0.9.1      2024-01-29 [1] CRAN (R 4.4.1)
   lme4                1.1-37     2025-03-26 [1] CRAN (R 4.4.2)
   loo                 2.8.0      2024-07-03 [1] CRAN (R 4.4.1)
   lubridate         * 1.9.4      2024-12-08 [1] CRAN (R 4.4.2)
   magic               1.6-1      2022-11-16 [1] CRAN (R 4.4.1)
   magick              2.8.5      2024-09-20 [1] CRAN (R 4.4.1)
   magrittr            2.0.3      2022-03-30 [1] CRAN (R 4.4.1)
   maps              * 3.4.2.1    2024-11-10 [1] CRAN (R 4.4.2)
   MASS              * 7.3-64     2025-01-04 [1] CRAN (R 4.4.2)
   Matrix            * 1.7-1      2024-10-18 [1] CRAN (R 4.4.2)
   matrixStats         1.5.0      2025-01-07 [1] CRAN (R 4.4.2)
   MCMCglmm          * 2.36       2024-05-06 [1] CRAN (R 4.4.1)
   memoise             2.0.1      2021-11-26 [1] CRAN (R 4.4.1)
   mgcv                1.9-1      2023-12-21 [2] CRAN (R 4.4.2)
   mime                0.12       2021-09-28 [1] CRAN (R 4.4.0)
   miniUI              0.1.1.1    2018-05-18 [1] CRAN (R 4.4.1)
   minqa               1.2.8      2024-08-17 [1] CRAN (R 4.4.1)
   misc3d              0.9-1      2021-10-07 [1] CRAN (R 4.4.1)
   mnormt              2.1.1      2022-09-26 [1] CRAN (R 4.4.0)
   multcomp            1.4-26     2024-07-18 [1] CRAN (R 4.4.2)
   munsell             0.5.1      2024-04-01 [1] CRAN (R 4.4.1)
   mvtnorm             1.3-2      2024-11-04 [1] CRAN (R 4.4.2)
   nlme                3.1-166    2024-08-14 [1] CRAN (R 4.4.1)
   nloptr              2.2.1      2025-03-17 [1] CRAN (R 4.4.3)
   numDeriv            2016.8-1.1 2019-06-06 [1] CRAN (R 4.4.0)
   optimParallel       1.0-2      2021-02-11 [1] CRAN (R 4.4.1)
   pacman              0.5.1      2019-03-11 [1] CRAN (R 4.4.1)
   parallelly          1.41.0     2024-12-18 [1] CRAN (R 4.4.2)
   pavo              * 2.9.0      2023-09-24 [1] CRAN (R 4.4.1)
   performance       * 0.12.4     2024-10-18 [1] CRAN (R 4.4.2)
   phangorn            2.12.1     2024-09-17 [1] CRAN (R 4.4.1)
   phyr              * 1.1.0      2020-12-18 [1] CRAN (R 4.4.1)
   phytools          * 2.4-4      2025-01-08 [1] CRAN (R 4.4.2)
   pillar              1.10.1     2025-01-07 [1] CRAN (R 4.4.2)
   pkgbuild            1.4.6      2025-01-16 [1] CRAN (R 4.4.1)
   pkgconfig           2.0.3      2019-09-22 [1] CRAN (R 4.4.1)
   pkgload             1.4.0      2024-06-28 [1] CRAN (R 4.4.1)
   plot3D              1.4.1      2024-02-06 [1] CRAN (R 4.4.1)
   png                 0.1-8      2022-11-29 [1] CRAN (R 4.4.0)
   posterior           1.6.1      2025-02-27 [1] CRAN (R 4.4.3)
   profvis             0.4.0      2024-09-20 [1] CRAN (R 4.4.1)
   progressr           0.15.1     2024-11-22 [1] CRAN (R 4.4.2)
   promises            1.3.2      2024-11-28 [1] CRAN (R 4.4.2)
   proxy               0.4-27     2022-06-09 [1] CRAN (R 4.4.1)
   purrr             * 1.0.2      2023-08-10 [1] CRAN (R 4.4.1)
   quadprog            1.5-8      2019-11-20 [1] CRAN (R 4.4.0)
   QuickJSR            1.5.1      2025-01-08 [1] CRAN (R 4.4.2)
   R6                  2.6.1      2025-02-15 [1] CRAN (R 4.4.2)
   rbenchmark        * 1.0.0      2012-08-30 [1] CRAN (R 4.4.0)
   rbibutils           2.3        2024-10-04 [1] CRAN (R 4.4.1)
   Rcpp              * 1.0.14     2025-01-12 [1] CRAN (R 4.4.2)
 D RcppParallel        5.1.9      2024-08-19 [1] CRAN (R 4.4.1)
   Rdpack              2.6.3      2025-03-16 [1] CRAN (R 4.4.3)
   readr             * 2.1.5      2024-01-10 [1] CRAN (R 4.4.1)
   readxl            * 1.4.3      2023-07-06 [1] CRAN (R 4.4.1)
   reformulas          0.4.0      2024-11-03 [1] CRAN (R 4.4.2)
   remotes             2.5.0      2024-03-17 [1] CRAN (R 4.4.1)
   rlang               1.1.5      2025-01-17 [1] CRAN (R 4.4.1)
   rmarkdown         * 2.29       2024-11-04 [1] CRAN (R 4.4.2)
   rprojroot           2.0.4      2023-11-05 [1] CRAN (R 4.4.1)
   rstan               2.32.6     2024-03-05 [1] CRAN (R 4.4.1)
   rstantools          2.4.0      2024-01-31 [1] CRAN (R 4.4.1)
   rstudioapi          0.17.1     2024-10-22 [1] CRAN (R 4.4.2)
   sandwich            3.1-1      2024-09-15 [1] CRAN (R 4.4.1)
   scales              1.3.0      2023-11-28 [1] CRAN (R 4.4.1)
   scatterplot3d       0.3-44     2023-05-05 [1] CRAN (R 4.4.0)
   sessioninfo       * 1.2.2      2021-12-06 [1] CRAN (R 4.4.1)
   sf                  1.0-19     2024-11-05 [1] CRAN (R 4.4.2)
   shiny               1.10.0     2024-12-14 [1] CRAN (R 4.4.2)
   sp                  2.1-4      2024-04-30 [1] CRAN (R 4.4.1)
   splitstackshape   * 1.4.8      2019-04-21 [1] CRAN (R 4.4.1)
   StanHeaders         2.32.10    2024-07-15 [1] CRAN (R 4.4.1)
   stringi             1.8.4      2024-05-06 [1] CRAN (R 4.4.0)
   stringr           * 1.5.1      2023-11-14 [1] CRAN (R 4.4.1)
   subplex             1.9        2024-07-05 [1] CRAN (R 4.4.1)
   survival            3.8-3      2024-12-17 [1] CRAN (R 4.4.2)
   tensorA             0.36.2.1   2023-12-13 [1] CRAN (R 4.4.0)
   TH.data             1.1-3      2025-01-17 [1] CRAN (R 4.4.2)
   tibble            * 3.2.1      2023-03-20 [1] CRAN (R 4.4.1)
   tidyr             * 1.3.1      2024-01-24 [1] CRAN (R 4.4.1)
   tidyselect          1.2.1      2024-03-11 [1] CRAN (R 4.4.1)
   tidyverse         * 2.0.0      2023-02-22 [1] CRAN (R 4.4.2)
   timechange          0.3.0      2024-01-18 [1] CRAN (R 4.4.1)
 D TMB                 1.9.17     2025-03-10 [1] CRAN (R 4.4.3)
   tzdb                0.4.0      2023-05-12 [1] CRAN (R 4.4.1)
   units               0.8-5      2023-11-28 [1] CRAN (R 4.4.1)
   urlchecker          1.0.1      2021-11-30 [1] CRAN (R 4.4.1)
   usethis           * 3.1.0      2024-11-26 [1] CRAN (R 4.4.2)
   vctrs               0.6.5      2023-12-01 [1] CRAN (R 4.4.1)
   viridisLite         0.4.2      2023-05-02 [1] CRAN (R 4.4.1)
   vroom             * 1.6.5      2023-12-05 [1] CRAN (R 4.4.1)
   withr               3.0.2      2024-10-28 [1] CRAN (R 4.4.2)
   xfun                0.50       2025-01-07 [1] CRAN (R 4.4.2)
   xml2                1.3.6      2023-12-04 [1] CRAN (R 4.4.1)
   xtable              1.8-4      2019-04-21 [1] CRAN (R 4.4.2)
   yaml                2.3.10     2024-07-26 [1] CRAN (R 4.4.1)
   zoo                 1.8-12     2023-04-13 [1] CRAN (R 4.4.1)

 [1] C:/Users/z5394590/AppData/Local/R/win-library/4.4
 [2] C:/Program Files/R/R-4.4.2/library

 D ── DLL MD5 mismatch, broken installation.

──────────────────────────────────────────────────────────────────────────────